Exploring Evictions and Code Violations in Philadelphia

In this assignment, I'll explore spatial trends in evictions in Philadelphia using data from the Eviction Lab and building code violations using data from OpenDataPhilly.

I'll be exploring the idea that evictions can occur as retaliation against renters for reporting code violations. Spatial correlations between evictions and code violations from the City's Licenses and Inspections department can offer some insight into this question.


1 Explore Eviction Lab Data

The Eviction Lab built the first national database of evictions. For more information about the project, you can explore their website: https://evictionlab.org/. Understanding the eviction data can inform efforts to protect housing rights, promote social equity, and allocate resources more effectively.

1.1 Read data using geopandas

The first step is to read the eviction data by census tract using geopandas. The data for all of Pennsylvania by census tract can be downloaded in GeoJSON format from the following URL:

https://eviction-lab-data-downloads.s3.amazonaws.com/PA/tracts.geojson

A browser-friendly version of the data is available here: https://data-downloads.evictionlab.org/

In [1]:
import numpy as np
import pandas as pd
import geopandas as gpd
import hvplot.pandas
import holoviews as hv
import cartopy.crs as ccrs
from matplotlib import pyplot as plt
%matplotlib inline
In [2]:
tracts = gpd.read_file('https://eviction-lab-data-downloads.s3.amazonaws.com/PA/tracts.geojson')
In [3]:
tracts.head()
Out[3]:
GEOID west south east north n pl p-00 pr-00 roh-00 ... pm-16 po-16 ef-16 e-16 er-16 efr-16 lf-16 imputed-16 subbed-16 geometry
0 42003412002 -80.1243 40.5422 -80.0640 40.5890 4120.02 Allegheny County, Pennsylvania 4748.59 0.88 58.0 ... 0.00 0.0 0.0 0.0 0.00 0.00 1.0 0.0 1.0 (POLYGON ((-80.06670099999999 40.584012, -80.0...
1 42003413100 -80.0681 40.5850 -79.9906 40.6143 4131 Allegheny County, Pennsylvania 6771.01 3.47 729.0 ... 1.59 0.0 12.0 2.0 0.27 1.62 1.0 0.0 1.0 (POLYGON ((-80.068057 40.612536, -80.054520999...
2 42003413300 -80.0657 40.5527 -80.0210 40.5721 4133 Allegheny County, Pennsylvania 5044.59 2.99 119.0 ... 0.95 0.0 4.0 1.0 0.49 1.96 1.0 0.0 1.0 (POLYGON ((-80.03821600000001 40.553495, -80.0...
3 42003416000 -79.8113 40.5440 -79.7637 40.5630 4160 Allegheny County, Pennsylvania 1775.93 4.99 121.0 ... 0.55 0.0 1.0 1.0 0.65 0.65 1.0 0.0 1.0 (POLYGON ((-79.765946 40.550915, -79.765415 40...
4 42003417200 -79.7948 40.5341 -79.7642 40.5443 4172 Allegheny County, Pennsylvania 1428.03 11.95 321.0 ... 0.00 0.0 7.0 3.0 0.82 1.90 1.0 0.0 1.0 (POLYGON ((-79.771137 40.544153, -79.764172 40...

5 rows × 399 columns

In [4]:
print(len(tracts))
3217

1.2 Explore and trim the data

We will need to trim the data to Philadelphia only. Take a look at the data dictionary for the descriptions of the various columns: https://eviction-lab-data-downloads.s3.amazonaws.com/DATA_DICTIONARY.txt

Note: the column names are shortened — see the end of the above file for the abbreviations. The numbers at the end of the columns indicate the years. For example, e-16 is the number of evictions in 2016.
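The year-suffix convention can be decoded programmatically. A minimal sketch (the column names below are reconstructed from the naming convention, not read from the file):

```python
# Eviction-count columns run from 'e-03' (2003) to 'e-16' (2016)
evic_cols = ['e-{:02d}'.format(y) for y in range(3, 17)]

# recover the four-digit year from each abbreviated column name
years = [2000 + int(c.split('-')[1]) for c in evic_cols]
```

The same pattern applies to the other abbreviated columns (e.g. `er-16` for the 2016 eviction rate).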

Take a look at the individual columns and trim to census tracts in Philadelphia. (Hint: Philadelphia is both a city and a county).

In [5]:
tracts_trim = tracts.loc[tracts['pl'] == 'Philadelphia County, Pennsylvania']
tracts_trim.head()
Out[5]:
GEOID west south east north n pl p-00 pr-00 roh-00 ... pm-16 po-16 ef-16 e-16 er-16 efr-16 lf-16 imputed-16 subbed-16 geometry
435 42101000100 -75.1523 39.9481 -75.1415 39.9569 1 Philadelphia County, Pennsylvania 2646.71 9.26 1347.0 ... 2.49 0.00 25.0 16.0 0.93 1.45 0.0 0.0 1.0 (POLYGON ((-75.14160699999999 39.955491, -75.1...
436 42101000200 -75.1631 39.9529 -75.1511 39.9578 2 Philadelphia County, Pennsylvania 1362.00 56.42 374.0 ... 2.27 0.00 11.0 8.0 0.95 1.30 0.0 0.0 1.0 (POLYGON ((-75.151223 39.956862, -75.151669 39...
437 42101000300 -75.1798 39.9544 -75.1623 39.9599 3 Philadelphia County, Pennsylvania 2570.00 12.16 861.0 ... 1.76 0.00 26.0 14.0 0.73 1.35 0.0 0.0 1.0 (POLYGON ((-75.162339 39.957825, -75.162374 39...
438 42101000801 -75.1833 39.9486 -75.1773 39.9515 8.01 Philadelphia County, Pennsylvania 1478.00 14.40 810.0 ... 1.42 3.78 13.0 4.0 0.51 1.64 0.0 0.0 1.0 (POLYGON ((-75.177323 39.950964, -75.177843 39...
439 42101000804 -75.1712 39.9470 -75.1643 39.9501 8.04 Philadelphia County, Pennsylvania 3301.00 14.40 2058.0 ... 0.19 0.35 22.0 7.0 0.33 1.04 0.0 0.0 1.0 (POLYGON ((-75.17118000000001 39.947784, -75.1...

5 rows × 399 columns

In [6]:
print(len(tracts_trim))
384
In [7]:
tracts_trim.crs #check crs
Out[7]:
{'init': 'epsg:4326'}

1.3 Transform from wide to tidy format

For this assignment, we are interested in the number of evictions by census tract for various years. Right now, each year has its own column, so it will be easiest to transform to a tidy format.

Use the pd.melt() function to transform the eviction data into tidy format, using the number of evictions from 2003 to 2016.

The tidy data frame should have four columns: GEOID, geometry, a column holding the number of evictions, and a column telling you what the name of the original column was for that value.

In [8]:
tracts_trim_melt = pd.melt(
    tracts_trim,
    id_vars=['GEOID','geometry'],
    value_vars=['e-{:02d}'.format(x) for x in range(3, 17)],
    value_name='number of evictions',
    var_name='year'
)
In [9]:
#change the year column to numbers (replace 'e-' to '20' and change to integers)
tracts_trim_melt['year'] = pd.to_numeric(tracts_trim_melt['year'].replace('e-', '20', regex=True))
In [10]:
tracts_trim_melt.head()
Out[10]:
GEOID geometry year number of evictions
0 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1... 2003 21.0
1 42101000200 (POLYGON ((-75.151223 39.956862, -75.151669 39... 2003 3.0
2 42101000300 (POLYGON ((-75.162339 39.957825, -75.162374 39... 2003 17.0
3 42101000801 (POLYGON ((-75.177323 39.950964, -75.177843 39... 2003 13.0
4 42101000804 (POLYGON ((-75.17118000000001 39.947784, -75.1... 2003 21.0
In [11]:
#assign crs
tracts_trim_melt.crs = {'init': 'epsg:4326'}
In [12]:
#check data type
type(tracts_trim_melt['year'][0])
Out[12]:
numpy.int64

1.4 Plot the total number of evictions per year from 2003 to 2016

Use hvplot to plot the total number of evictions from 2003 to 2016. First, perform a groupby operation to sum the total number of evictions across all census tracts, and then use hvplot() to make the plot.

In [13]:
groupbyyear = tracts_trim_melt.groupby('year')['number of evictions'].sum()
groupbyyear.head()
Out[13]:
year
2003    10647.0
2004    10491.0
2005    10550.0
2006    11078.0
2007    11032.0
Name: number of evictions, dtype: float64
In [14]:
trendplot = groupbyyear.hvplot(kind='line')
trendplot
Out[14]:

By looking at the number of evictions over time, I found that there are highs and lows in eviction volume, but overall the volume stays relatively constant. There might be a cyclical pattern in the total number of evictions, possibly related to economic activity or other external factors.

1.5 The number of evictions across Philadelphia

Our tidy data frame is still a GeoDataFrame with a geometry column, so I can visualize the number of evictions for all census tracts.

Use hvplot() to generate a choropleth showing the number of evictions for a specified year, with a widget dropdown to select a given year (or variable name, e.g., e-16, e-15, etc).

In [15]:
by_year = tracts_trim_melt.hvplot(c='number of evictions', 
                                  groupby='year', 
                                  width=600, 
                                  height=550, 
                                  geo=True, 
                                  dynamic=False).options(cmap='Magma')
by_year 
Out[15]:

High eviction rates tend to occur in West Philly, North Philly, and Northeast Philly. Tracts in South Philly appear to have lower eviction numbers compared to other tracts in the city.

2 Code Violations in Philadelphia

Next, we'll explore data for code violations from the Licenses and Inspections Department of Philadelphia to look for potential correlations with the number of evictions.

2.1 Load data from 2012 to 2016

L+I violation data for the years 2012 through 2016 (inclusive) is provided in CSV format in the "data/" folder.

Load the data using pandas and convert to a GeoDataFrame.

In [16]:
violation = pd.read_csv("./data/li_violations.csv")
violation.head()
Out[16]:
lat lng violationdescription
0 40.050526 -75.126076 CLIP VIOLATION NOTICE
1 40.050593 -75.126578 LICENSE-CHANGE OF ADDRESS
2 40.050593 -75.126578 LICENSE-RES SFD/2FD
3 39.991994 -75.128895 EXT A-CLEAN WEEDS/PLANTS
4 40.023260 -75.164848 EXT A-VACANT LOT CLEAN/MAINTAI
In [17]:
#drop nan values of the coordinates
violation = violation.dropna(subset=['lat', 'lng'])
In [18]:
#convert to 4326 crs
violation['Coordinates'] = list(zip(violation['lng'], violation['lat']))
from shapely.geometry import Point
violation['Coordinates'] = violation['Coordinates'].apply(Point)
violation = gpd.GeoDataFrame(violation, geometry="Coordinates", crs={"init": "epsg:4326"})

2.2 Trim to specific violation types

There are many different types of code violations (running nunique() on the violationdescription column will count the distinct types, and unique() will list them). More information on the different types of violations can be found on the City's website.
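As a quick reminder of the distinction, nunique() counts the distinct values while unique() returns them. A toy sketch with a made-up stand-in for the violationdescription column:

```python
import pandas as pd

# toy stand-in for the violationdescription column
desc = pd.Series(['EXT S-ROOF REPAIR', 'CO DETECTOR NEEDED', 'EXT S-ROOF REPAIR'])

n_types = desc.nunique()       # number of distinct violation types
types = sorted(desc.unique())  # the distinct values themselves
```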

Below, I've selected 15 types of violations that deal with property maintenance and licensing issues. We'll focus on these violations. The goal is to see if these kinds of violations are correlated spatially with the number of evictions in a given area.

In [19]:
violation_types = ['INT-PLMBG MAINT FIXTURES-RES',
 'INT S-CEILING REPAIR/MAINT SAN',
 'PLUMBING SYSTEMS-GENERAL',
 'CO DETECTOR NEEDED',
 'INTERIOR SURFACES',
 'EXT S-ROOF REPAIR',
 'ELEC-RECEPTABLE DEFECTIVE-RES',
 'INT S-FLOOR REPAIR',
 'DRAINAGE-MAIN DRAIN REPAIR-RES',
 'DRAINAGE-DOWNSPOUT REPR/REPLC',
 'LIGHT FIXTURE DEFECTIVE-RES',
 'LICENSE-RES SFD/2FD',
 'ELECTRICAL -HAZARD',
 'VACANT PROPERTIES-GENERAL',
 'INT-PLMBG FIXTURES-RES']
In [20]:
trim_violation = violation[violation['violationdescription'].isin(violation_types)]
trim_violation
Out[20]:
lat lng violationdescription Coordinates
2 40.050593 -75.126578 LICENSE-RES SFD/2FD POINT (-75.12657800000001 40.050593)
25 40.022406 -75.121872 EXT S-ROOF REPAIR POINT (-75.121872 40.022406)
30 40.023237 -75.121726 CO DETECTOR NEEDED POINT (-75.121726 40.023237)
31 40.023397 -75.122241 INT S-CEILING REPAIR/MAINT SAN POINT (-75.122241 40.023397)
34 40.023773 -75.121603 INT S-FLOOR REPAIR POINT (-75.12160300000001 40.023773)
... ... ... ... ...
433982 39.962287 -75.226644 CO DETECTOR NEEDED POINT (-75.22664399999999 39.962287)
433985 39.968669 -75.212576 CO DETECTOR NEEDED POINT (-75.212576 39.968669)
434013 39.950209 -75.227244 INT S-CEILING REPAIR/MAINT SAN POINT (-75.227244 39.950209)
434043 39.936179 -75.192078 INT S-FLOOR REPAIR POINT (-75.19207800000001 39.936179)
434046 40.012805 -75.155963 ELECTRICAL -HAZARD POINT (-75.155963 40.012805)

34108 rows × 4 columns

2.3 Make a hex bin map

The code violation data is point data. We can get a quick look at the geographic distribution using matplotlib and the hexbin() function. Make a hex bin map of the code violations and overlay the census tract outlines.
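The mincnt behavior is worth seeing in isolation. A minimal standalone hexbin example on synthetic points (not the violation data):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # render off-screen
from matplotlib import pyplot as plt

# synthetic point cloud standing in for violation coordinates
rng = np.random.default_rng(0)
x, y = rng.normal(size=1000), rng.normal(size=1000)

fig, ax = plt.subplots()
# mincnt=1 leaves bins with zero points uncolored
hb = ax.hexbin(x, y, gridsize=20, cmap='magma', mincnt=1)
counts = hb.get_array()  # per-bin point counts for the bins actually drawn
```

Because every point falls in some bin of the default extent, the drawn bin counts sum to the number of points.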

In [21]:
#note: this only re-labels the CRS metadata (the coordinates remain in lon/lat);
#a true reprojection to Web Mercator would use .to_crs(epsg=3857)
crs_wm = {'init': 'epsg:3857'}
tracts_trim_melt = gpd.GeoDataFrame(tracts_trim_melt, geometry='geometry', crs=crs_wm)
trim_violation = gpd.GeoDataFrame(trim_violation, geometry="Coordinates", crs=crs_wm)
In [22]:
fig, ax = plt.subplots(figsize=(12.5, 11))
vals = ax.hexbin(trim_violation.geometry.x, trim_violation.geometry.y, gridsize=80, cmap='magma', mincnt=1) 
#I have set mincnt=1 so that empty bins are left uncolored

# add the tract geometry boundaries
tracts_trim_melt.plot(ax=ax, facecolor="none", edgecolor="grey", linewidth=0.2)
# add a colorbar and format
plt.colorbar(vals)
ax.set_axis_off()
ax.set_aspect("auto")

Visually, the code violation clusters do coincide with the clusters in the eviction data.

2.4 Spatially join data sets

To do a census tract comparison to the eviction data, I need to find which census tract each of the code violations falls into. Use the geopandas.sjoin() function to do just that.
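The mechanics of sjoin() can be seen on a toy example. Note that newer geopandas versions spell the spatial predicate `predicate=` rather than `op=`; the tracts and points below are made up for illustration:

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# two toy "tracts" side by side
tracts_toy = gpd.GeoDataFrame(
    {'GEOID': ['tract_a', 'tract_b']},
    geometry=[Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
              Polygon([(1, 0), (2, 0), (2, 1), (1, 1)])])

# two toy "violations", one inside each tract
points_toy = gpd.GeoDataFrame(geometry=[Point(0.5, 0.5), Point(1.5, 0.5)])

# left join: every point keeps a row, with the containing tract's attributes attached
joined_toy = gpd.sjoin(points_toy, tracts_toy, predicate='within', how='left')
```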

In [23]:
#check if the crs of the two data frame is the same
trim_violation.crs
Out[23]:
{'init': 'epsg:3857'}
In [24]:
tracts_trim_melt.crs
Out[24]:
{'init': 'epsg:3857'}
In [25]:
#drop the eviction rows, only keep the tracts info
tracts_geo = tracts_trim_melt.drop(['year','number of evictions'],axis = 1)
In [26]:
#drop duplicate rows, keep the unique tracts
tracts_geo_u = tracts_geo.drop_duplicates(subset = 'GEOID', keep = 'first', inplace = False) 
tracts_geo_u.head()
Out[26]:
GEOID geometry
0 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1...
1 42101000200 (POLYGON ((-75.151223 39.956862, -75.151669 39...
2 42101000300 (POLYGON ((-75.162339 39.957825, -75.162374 39...
3 42101000801 (POLYGON ((-75.177323 39.950964, -75.177843 39...
4 42101000804 (POLYGON ((-75.17118000000001 39.947784, -75.1...
In [29]:
#spatial join
joined = gpd.sjoin(trim_violation, tracts_geo_u, op='within', how='left')
In [30]:
joined
Out[30]:
lat lng violationdescription Coordinates index_right GEOID
2 40.050593 -75.126578 LICENSE-RES SFD/2FD POINT (-75.12657800000001 40.050593) 364 42101027100
25 40.022406 -75.121872 EXT S-ROOF REPAIR POINT (-75.121872 40.022406) 81 42101028800
30 40.023237 -75.121726 CO DETECTOR NEEDED POINT (-75.121726 40.023237) 81 42101028800
31 40.023397 -75.122241 INT S-CEILING REPAIR/MAINT SAN POINT (-75.122241 40.023397) 81 42101028800
34 40.023773 -75.121603 INT S-FLOOR REPAIR POINT (-75.12160300000001 40.023773) 81 42101028800
... ... ... ... ... ... ...
433982 39.962287 -75.226644 CO DETECTOR NEEDED POINT (-75.22664399999999 39.962287) 234 42101009300
433985 39.968669 -75.212576 CO DETECTOR NEEDED POINT (-75.212576 39.968669) 270 42101010500
434013 39.950209 -75.227244 INT S-CEILING REPAIR/MAINT SAN POINT (-75.227244 39.950209) 220 42101008000
434043 39.936179 -75.192078 INT S-FLOOR REPAIR POINT (-75.19207800000001 39.936179) 188 42101003300
434046 40.012805 -75.155963 ELECTRICAL -HAZARD POINT (-75.155963 40.012805) 265 42101020102

34108 rows × 6 columns

2.5 Calculate the number of violations by type per census tract

Next, we'll want to find the number of violations (of each kind) per census tract. I grouped the data frame by violation type and census tract (GEOID).

The result of this step should be a data frame with three columns: violationdescription, GEOID, and N, where N is the number of violations of that kind in the specified census tract.
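The subtle part is zero-filling: a plain groupby drops (type, tract) pairs with no violations, while round-tripping through unstack(fill_value=0) and stack() inserts explicit zeros. A toy sketch of the idiom:

```python
import pandas as pd

# toy frame: type 'B' never appears in tract 't2'
toy = pd.DataFrame({'violationdescription': ['A', 'A', 'B'],
                    'GEOID': ['t1', 't2', 't1']})

counts = (toy.groupby(['violationdescription', 'GEOID'])
             .size()                  # observed (type, tract) counts only
             .unstack(fill_value=0)   # missing pairs become explicit zeros
             .stack()                 # back to one row per (type, tract)
             .reset_index(name='N'))
```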

In [31]:
joined_sum = joined.groupby(['violationdescription','GEOID']).size().unstack(fill_value=0).stack().reset_index(name='N')
joined_sum
Out[31]:
violationdescription GEOID N
0 CO DETECTOR NEEDED 42101000100 0
1 CO DETECTOR NEEDED 42101000200 0
2 CO DETECTOR NEEDED 42101000300 0
3 CO DETECTOR NEEDED 42101000401 1
4 CO DETECTOR NEEDED 42101000402 1
... ... ... ...
5530 VACANT PROPERTIES-GENERAL 42101980100 0
5531 VACANT PROPERTIES-GENERAL 42101980500 0
5532 VACANT PROPERTIES-GENERAL 42101980700 0
5533 VACANT PROPERTIES-GENERAL 42101980800 0
5534 VACANT PROPERTIES-GENERAL 42101989100 0

5535 rows × 3 columns

2.6 Merge with census tracts geometries

We now have the number of violations of each type per census tract as a regular DataFrame. I can now merge it with the census tract geometries (from the eviction data GeoDataFrame) to create a GeoDataFrame.

In [32]:
violation_join = tracts_geo_u.merge(joined_sum, on='GEOID')
In [33]:
violation_join
Out[33]:
GEOID geometry violationdescription N
0 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1... CO DETECTOR NEEDED 0
1 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1... DRAINAGE-DOWNSPOUT REPR/REPLC 6
2 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1... DRAINAGE-MAIN DRAIN REPAIR-RES 0
3 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1... ELEC-RECEPTABLE DEFECTIVE-RES 0
4 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1... ELECTRICAL -HAZARD 1
... ... ... ... ...
5530 42101018400 (POLYGON ((-75.059017 39.992512, -75.059542562... INTERIOR SURFACES 2
5531 42101018400 (POLYGON ((-75.059017 39.992512, -75.059542562... LICENSE-RES SFD/2FD 11
5532 42101018400 (POLYGON ((-75.059017 39.992512, -75.059542562... LIGHT FIXTURE DEFECTIVE-RES 0
5533 42101018400 (POLYGON ((-75.059017 39.992512, -75.059542562... PLUMBING SYSTEMS-GENERAL 2
5534 42101018400 (POLYGON ((-75.059017 39.992512, -75.059542562... VACANT PROPERTIES-GENERAL 0

5535 rows × 4 columns

2.7 Interactive choropleths for each violation type

Now, we can use hvplot() to create an interactive choropleth for each violation type and add a widget to specify different violation types.

In [34]:
violation_plot = violation_join.hvplot(c='N', 
                                       groupby='violationdescription', 
                                       width=600, 
                                       height=550, 
                                       geo=True, 
                                       dynamic=False).options(cmap='Magma')
violation_plot
Out[34]:

3 A side-by-side comparison

From the interactive maps of evictions and violations, I found that there is a lot of spatial overlap.

As a final step, I'll make a side-by-side comparison to better show the spatial correlations. This will involve a few steps:

  1. Trim the data frame plotted in section 1.5 to only include evictions from 2016.
  2. Trim the data frame plotted in section 2.7 to only include a single violation type (pick 'LICENSE-RES SFD/2FD').
  3. Use hvplot() to make two interactive choropleth maps, one for the data from step 1 and one for the data from step 2.
  4. Show these two plots side by side (one row and two columns) using the syntax for combining charts.
In [35]:
#trim eviction data to include only 2016 data
eviction_2016 = tracts_trim_melt[tracts_trim_melt['year']==2016]
eviction_2016
Out[35]:
GEOID geometry year number of evictions
4992 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1... 2016 16.0
4993 42101000200 (POLYGON ((-75.151223 39.956862, -75.151669 39... 2016 8.0
4994 42101000300 (POLYGON ((-75.162339 39.957825, -75.162374 39... 2016 14.0
4995 42101000801 (POLYGON ((-75.177323 39.950964, -75.177843 39... 2016 4.0
4996 42101000804 (POLYGON ((-75.17118000000001 39.947784, -75.1... 2016 7.0
... ... ... ... ...
5371 42101017800 (POLYGON ((-75.113387 39.996493, -75.111369 39... 2016 104.0
5372 42101017900 (POLYGON ((-75.105915 39.988037, -75.108358999... 2016 80.0
5373 42101018002 (POLYGON ((-75.105064 39.987073, -75.104365 39... 2016 32.0
5374 42101018300 (POLYGON ((-75.06581466910771 39.9862895182783... 2016 7.0
5375 42101018400 (POLYGON ((-75.059017 39.992512, -75.059542562... 2016 2.0

384 rows × 4 columns

In [36]:
#choose 'LICENSE-RES SFD/2FD' as the violation type of interest
license_res = violation_join[violation_join['violationdescription']=='LICENSE-RES SFD/2FD']
license_res
Out[36]:
GEOID geometry violationdescription N
11 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1... LICENSE-RES SFD/2FD 4
26 42101000200 (POLYGON ((-75.151223 39.956862, -75.151669 39... LICENSE-RES SFD/2FD 0
41 42101000300 (POLYGON ((-75.162339 39.957825, -75.162374 39... LICENSE-RES SFD/2FD 0
56 42101000801 (POLYGON ((-75.177323 39.950964, -75.177843 39... LICENSE-RES SFD/2FD 0
71 42101000804 (POLYGON ((-75.17118000000001 39.947784, -75.1... LICENSE-RES SFD/2FD 1
... ... ... ... ...
5471 42101017800 (POLYGON ((-75.113387 39.996493, -75.111369 39... LICENSE-RES SFD/2FD 86
5486 42101017900 (POLYGON ((-75.105915 39.988037, -75.108358999... LICENSE-RES SFD/2FD 106
5501 42101018002 (POLYGON ((-75.105064 39.987073, -75.104365 39... LICENSE-RES SFD/2FD 26
5516 42101018300 (POLYGON ((-75.06581466910771 39.9862895182783... LICENSE-RES SFD/2FD 15
5531 42101018400 (POLYGON ((-75.059017 39.992512, -75.059542562... LICENSE-RES SFD/2FD 11

369 rows × 4 columns

In [37]:
eviction_2016_plot = eviction_2016.hvplot(c='number of evictions', 
                                          width=490, 
                                          height=470, 
                                          geo=True, 
                                          dynamic=False).options(cmap='Magma').relabel('Evictions in 2016')
license_res_plot = license_res.hvplot(c='N', 
                                      width=490, 
                                      height=470, 
                                      geo=True, 
                                      dynamic=False).options(cmap='Magma').relabel('LICENSE-RES SFD/2FD Violations')
combined = eviction_2016_plot+license_res_plot
combined
Out[37]:

Significant correlations can be found in upper Northeast Philly and West Philly.

4 Further Analysis

Identify the 20 most common types of violations within the time period of 2012 to 2016 and create a set of interactive choropleths similar to what was done in section 2.7.

Use this set of maps to identify 3 types of violations that don't seem to have much spatial overlap with the number of evictions in the City.

In [38]:
#re-label the CRS metadata to match the tracts frame (a true reprojection would use .to_crs)
violation = gpd.GeoDataFrame(violation, geometry="Coordinates", crs=crs_wm)
violation.crs
Out[38]:
{'init': 'epsg:3857'}
In [39]:
#group and count the number of violations of each type
violation_type = violation.groupby('violationdescription').size()
violation_type.head()
Out[39]:
violationdescription
ADMINISTRATIVE -ID BARRICADES    13
ADULT BOOK SALE/STORAGE           1
ADVERTISING SIGN BUSINESS LO      1
ADVERTISING SIGN BUSINESS LR      1
AFCI RECEPTACLE REQ'D            59
dtype: int64
In [40]:
#rank and find the top 20 most frequent violation types
violation_type = violation_type.sort_values(ascending=False)
violation_top20 = violation_type.iloc[:20].reset_index()
violation_top20
Out[40]:
violationdescription 0
0 CLIP VIOLATION NOTICE 64811
1 EXT A-VACANT LOT CLEAN/MAINTAI 32633
2 HIGH WEEDS-CUT 20504
3 LICENSE-VAC RES BLDG 15278
4 VACANT PROP STANDARD 12283
5 RUBBISH/GARBAGE EXTERIOR-OWNER 11416
6 EXT A-CLEAN RUBBISH/GARBAGE 8832
7 LICENSE-RES SFD/2FD 8179
8 EXT A-CLEAN WEEDS/PLANTS 7904
9 LICENSE-RES GENERAL 7682
10 VACANT BLDG UNSECURED COUNT 6614
11 INT S-CEILING REPAIR/MAINT SAN 5146
12 VIOL C&I MESSAGE 5031
13 CO DETECTOR NEEDED 4934
14 ANNUAL CERT FIRE ALARM 4611
15 LICENSE - RENTAL PROPERTY 4226
16 VAC PROP REPLAC WIN/DRS 80% 4147
17 SD-REQD EXIST GROUP R 4052
18 PERM Z- NEW USE 3863
19 INT S-WALLS REPAIR/MAINT SANI 3396
In [41]:
violation_top20_f = violation[violation['violationdescription'].isin(violation_top20['violationdescription'])]
In [42]:
#join back with the tracts data
joined2 = gpd.sjoin(violation_top20_f, tracts_geo_u, op='within', how='left')
In [43]:
joined2 = joined2.groupby(['violationdescription','GEOID']).size().unstack(fill_value=0).stack().reset_index(name='N')
In [44]:
violation_join2 = tracts_geo_u.merge(joined2, on='GEOID')
violation_join2.head()
Out[44]:
GEOID geometry violationdescription N
0 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1... ANNUAL CERT FIRE ALARM 55
1 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1... CLIP VIOLATION NOTICE 5
2 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1... CO DETECTOR NEEDED 0
3 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1... EXT A-CLEAN RUBBISH/GARBAGE 4
4 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1... EXT A-CLEAN WEEDS/PLANTS 0
In [45]:
violation_join2_plot = violation_join2.hvplot(c='N', 
                                              groupby='violationdescription', 
                                              width=600, 
                                              height=550, 
                                              geo=True, 
                                              dynamic=False).options(cmap='Magma')
violation_join2_plot
Out[45]:

The three violation types that don't seem to have much spatial overlap with the number of evictions in the city are:

  • high weeds-cut
  • license - rental property
  • viol c&i message

Through these analyses, we gain a better understanding of the spatial patterns of both the eviction data and the code violation data, as well as the correlation between the two.
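The comparisons above are visual; a natural next step would be to quantify the overlap with a tract-level correlation. A minimal sketch using hypothetical per-tract counts (not the real data):

```python
import pandas as pd

# hypothetical per-tract totals, for illustration only
evictions_toy = pd.DataFrame({'GEOID': ['t1', 't2', 't3'],
                              'evictions': [16, 8, 104]})
violations_toy = pd.DataFrame({'GEOID': ['t1', 't2', 't3'],
                               'N': [4, 0, 86]})

# align the two counts on GEOID and compute a Pearson correlation across tracts
merged = evictions_toy.merge(violations_toy, on='GEOID')
r = merged['evictions'].corr(merged['N'])
```

With the real frames, the same merge-and-corr pattern (evictions for 2016 against per-tract counts of a chosen violation type) would put a number on the visual impression from the side-by-side maps.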